Goto

Collaborating Authors

 transferability estimation



TMT: Cross-domain Semantic Segmentation with Region-adaptive Transferability Estimation

arXiv.org Artificial Intelligence

Recent advances in Vision Transformers (ViTs) have significantly advanced semantic segmentation performance. However, their adaptation to new target domains remains challenged by distribution shifts, which often disrupt global attention mechanisms. While existing global and patch-level adaptation methods offer some improvements, they overlook the spatially varying transferability inherent in different image regions. To address this, we propose the Transferable Mask Transformer (TMT), a region-adaptive framework designed to enhance cross-domain representation learning through transferability guidance. First, we dynamically partition the image into coherent regions, grouped by structural and semantic similarity, and estimates their domain transferability at a localized level. Then, we incorporate region-level transferability maps directly into the self-attention mechanism of ViTs, allowing the model to adaptively focus attention on areas with lower transferability and higher semantic uncertainty. Extensive experiments across 20 diverse cross-domain settings demonstrate that TMT not only mitigates the performance degradation typically associated with domain shift but also consistently outperforms existing approaches.



How NOT to benchmark your SITE metric: Beyond Static Leaderboards and Towards Realistic Evaluation

arXiv.org Artificial Intelligence

Transferability estimation metrics are used to find a high-performing pre-trained model for a given target task without fine-tuning models and without access to the source dataset. Despite the growing interest in developing such metrics, the benchmarks used to measure their progress have gone largely unexamined. In this work, we empirically show the shortcomings of widely used benchmark setups to evaluate transferability estimation metrics. We argue that the benchmarks on which these metrics are evaluated are fundamentally flawed. We empirically demonstrate that their unrealistic model spaces and static performance hierarchies artificially inflate the perceived performance of existing metrics, to the point where simple, dataset-agnostic heuristics can outperform sophisticated methods. Our analysis reveals a critical disconnect between current evaluation protocols and the complexities of real-world model selection. To address this, we provide concrete recommendations for constructing more robust and realistic benchmarks to guide future research in a more meaningful direction.


Estimating Time Series Foundation Model Transferability via In-Context Learning

arXiv.org Artificial Intelligence

Time series foundation models (TSFMs) offer strong zero-shot forecasting via large-scale pre-training, yet fine-tuning remains critical for boosting performance in domains with limited public data. With the growing number of TSFMs, efficiently identifying the best model for downstream fine-tuning becomes increasingly challenging. Leveraging the natural tabular structure formed by dataset meta-features, model characteristics, and fine-tuned performance, we employ tabular foundation models to serve as in-context learners. We establish a comprehensive benchmark for transferability estimation including 10 datasets, 10 foundation models, and 3 forecasting tasks. 's estimation demonstrates strong alignment with actual fine-tuned performance for previously unseen datasets, achieving a mean rank correlation of approximately 0.6 and a 30% improvement compared to using zero-shot performance as the transferability score. The emergence of time series foundation models (TSFMs) is reshaping the paradigm of time series forecasting (Liang et al., 2025) through their strong zero-shot capabilities. Although efficient and cost-effective, zero-shot inference often underperforms in out-of-distribution scenarios, particularly in domains with limited public data, such as healthcare (Gupta et al., 2024) and finance (Fu et al., 2024). Fine-tuning helps bridge the gap by transferring generalized knowledge from large-scale pre-training to specific, resource-limited downstream tasks (Li & Zhu, 2025). However, due to the inherent diversity of time series data, no single model consistently outperforms others in all scenarios (Brigato et al., 2025).


Analysis of Transferability Estimation Metrics for Surgical Phase Recognition

arXiv.org Artificial Intelligence

Fine-tuning pre-trained models has become a cornerstone of modern machine learning, allowing practitioners to achieve high performance with limited labeled data. In surgical video analysis, where expert annotations are especially time-consuming and costly, identifying the most suitable pre-trained model for a downstream task is both critical and challenging. Source-independent transferability estimation (SITE) offers a solution by predicting how well a model will fine-tune on target data using only its embeddings or outputs, without requiring full retraining. In this work, we formalize SITE for surgical phase recognition and provide the first comprehensive benchmark of three representative metrics, LogME, H-Score, and TransRate, on two diverse datasets (RAMIE and AutoLaparo). Our results show that LogME, particularly when aggregated by the minimum per-subset score, aligns most closely with fine-tuning accuracy; H-Score yields only weak predictive power; and TransRate often inverses true model rankings. Ablation studies show that when candidate models have similar performances, transferability estimates lose discriminative power, emphasizing the importance of maintaining model diversity or using additional validation. We conclude with practical guidelines for model selection and outline future directions toward domain-specific metrics, theoretical foundations, and interactive benchmarking tools.


Benchmarking Transferability: A Framework for Fair and Robust Evaluation

arXiv.org Artificial Intelligence

Transferability scores aim to quantify how well a model trained on one domain generalizes to a target domain. Despite numerous methods proposed for measuring transferability, their reliability and practical usefulness remain inconclusive, often due to differing experimental setups, datasets, and assumptions. In this paper, we introduce a comprehensive benchmarking framework designed to systematically evaluate transferability scores across diverse settings. Through extensive experiments, we observe variations in how different metrics perform under various scenarios, suggesting that current evaluation practices may not fully capture each method's strengths and limitations. Our findings underscore the value of standardized assessment protocols, paving the way for more reliable transferability measures and better-informed model selection in cross-domain applications. Additionally, we achieved a 3.5% improvement using our proposed metric for the head-training fine-tuning experimental setup. Our code is available in this repository: https://github.com/alizkzm/


Feature Space Perturbation: A Panacea to Enhanced Transferability Estimation

arXiv.org Artificial Intelligence

Most existing metrics primarily focus on identifying the statistical relationship between feature embeddings and the corresponding labels within the target dataset, but overlook crucial aspect of model robustness. This oversight may limit their effectiveness in accurately ranking pre-trained models. T o address this limitation, we introduce a feature perturbation method that enhances the transferability estimation process by systematically altering the feature space. Our method includes a Spread operation that increases intra-class variability, adding complexity within classes, and an Attract operation that minimizes the distances between different classes, thereby blurring the class boundaries. Through extensive experimentation, we demonstrate the efficacy of our feature perturbation method in providing a more precise and robust estimation of model transferability. Notably, the existing LogMe method exhibited a significant improvement, showing a 28.84 % increase in performance after applying our feature perturbation method. The implementation is available at https://github.


Occam's model: Selecting simpler representations for better transferability estimation

arXiv.org Artificial Intelligence

Fine-tuning models that have been pre-trained on large datasets has become a cornerstone of modern machine learning workflows. With the widespread availability of online model repositories, such as Hugging Face, it is now easier than ever to fine-tune pre-trained models for specific tasks. This raises a critical question: which pre-trained model is most suitable for a given task? This problem is called transferability estimation. In this work, we introduce two novel and effective metrics for estimating the transferability of pre-trained models. Our approach is grounded in viewing transferability as a measure of how easily a pre-trained model's representations can be trained to separate target classes, providing a unique perspective on transferability estimation. We rigorously evaluate the proposed metrics against state-of-the-art alternatives across diverse problem settings, demonstrating their robustness and practical utility. Additionally, we present theoretical insights that explain our metrics' efficacy and adaptability to various scenarios. We experimentally show that our metrics increase Kendall's Tau by up to 32% compared to the state-of-the-art baselines.


Selection, Ensemble, and Adaptation: Advancing Multi-Source-Free Domain Adaptation via Architecture Zoo

arXiv.org Artificial Intelligence

Conventional Multi-Source Free Domain Adaptation (MSFDA) assumes that each source domain provides a single source model, and all source models adopt a uniform architecture. This paper introduces Zoo-MSFDA, a more general setting that allows each source domain to offer a zoo of multiple source models with different architectures. While it enriches the source knowledge, Zoo-MSFDA risks being dominated by suboptimal/harmful models. To address this issue, we theoretically analyze the model selection problem in Zoo-MSFDA, and introduce two principles: transferability principle and diversity principle. Recognizing the challenge of measuring transferability, we subsequently propose a novel Source-Free Unsupervised Transferability Estimation (SUTE). It enables assessing and comparing transferability across multiple source models with different architectures under domain shift, without requiring target labels and source data. Based on above, we introduce a Selection, Ensemble, and Adaptation (SEA) framework to address Zoo-MSFDA, which consists of: 1) source models selection based on the proposed principles and SUTE; 2) ensemble construction based on SUTE-estimated transferability; 3) target-domain adaptation of the ensemble model. Evaluations demonstrate that our SEA framework, with the introduced Zoo-MSFDA setting, significantly improves adaptation performance (e.g., 13.5% on DomainNet). Additionally, our SUTE achieves state-of-the-art performance in transferability estimation.